Analog SGD
In-memory Training on Analog Devices with Limited Conductance States via Multi-tile Residual Learning
Jindan Li, Zhaoxian Wu, Gaowen Liu, Tayfun Gokmen, Tianyi Chen
Analog in-memory computing (AIMC) accelerators enable efficient deep neural network computation directly within memory using resistive crossbar arrays, where model parameters are represented by the conductance states of memristive devices. However, effective in-memory training typically requires at least 8-bit conductance states to match digital baselines. Realizing such fine-grained states is costly and often requires complex noise-mitigation techniques that increase circuit complexity and energy consumption. In practice, many promising memristive devices such as ReRAM offer only about 4-bit resolution due to fabrication constraints, and this limited update precision substantially degrades training accuracy. To enable on-chip training with these limited-state devices, this paper proposes a residual learning framework that sequentially learns on multiple crossbar tiles to compensate for the residual errors from low-precision weight updates. Our theoretical analysis shows that the optimality gap shrinks with the number of tiles and achieves a linear convergence rate. Experiments on standard image classification benchmarks demonstrate that our method consistently outperforms state-of-the-art in-memory analog training strategies under limited-state settings, while incurring only moderate hardware overhead, as confirmed by our cost analysis.
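To make the multi-tile idea concrete, here is a minimal NumPy sketch under stated assumptions: each low-precision tile is trained on the residual left by the previously frozen tiles, and the effective weight is the sum of all tiles. The quantizer, the shrinking per-tile range, and all constants (16 levels, learning rate, tile count) are illustrative choices, not the paper's actual method.

```python
# Minimal sketch of multi-tile residual learning (illustrative, not the
# paper's implementation): fit y = X @ w_star with K low-precision tiles,
# each trained on the residual left by the tiles before it. Weights snap
# to `levels` conductance states (here 4-bit, i.e. 16 levels).
import numpy as np

rng = np.random.default_rng(0)
n, d, K, levels, lr, steps = 512, 16, 3, 16, 0.5, 300

X = rng.normal(size=(n, d))
w_star = rng.uniform(-1, 1, size=d)
y = X @ w_star

def quantize(w, scale):
    """Snap weights to `levels` evenly spaced states in [-scale, scale]."""
    step = 2 * scale / (levels - 1)
    return np.clip(np.round(w / step) * step, -scale, scale)

tiles, scale = [], 1.0
for k in range(K):
    frozen = sum(tiles) if tiles else np.zeros(d)
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ (frozen + w) - y) / n   # gradient w.r.t. active tile
        w = quantize(w - lr * grad, scale)        # update constrained to device states
    tiles.append(w)
    # Heuristic: the next tile only needs to cover the previous tile's
    # quantization residual, so shrink its range accordingly.
    scale = 4 * scale / (levels - 1)
    print(f"tile {k + 1}: residual MSE = {np.mean((X @ sum(tiles) - y) ** 2):.2e}")
```

Shrinking each successive tile's range mirrors the intuition that a later tile only needs to represent the quantization residual of its predecessors, which is why the printed residual error drops with every added tile.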
Pipeline Gradient-based Model Training on Analog In-memory Accelerators
Zhaoxian Wu, Quan Xiao, Tayfun Gokmen, Hsinyu Tsai, Kaoutar El Maghraoui, Tianyi Chen
Aiming to accelerate the training of large deep neural models in an energy-efficient way, analog in-memory computing (AIMC) accelerators emerge as a solution with immense potential. In AIMC accelerators, trainable weights are kept in memory without the need to move them from memory to processors during training, significantly reducing overhead. However, although the in-memory feature enables efficient computation, it also constrains the use of data parallelism, since copying weights from one AIMC accelerator to another is expensive. To enable parallel training using AIMC, we propose synchronous and asynchronous pipeline parallelism for AIMC accelerators, inspired by pipeline parallelism in the digital domain. This paper provides a theoretical convergence guarantee for both synchronous and asynchronous pipelines in terms of both sampling and clock-cycle complexity, which is non-trivial since the physical characteristics of AIMC accelerators lead to analog updates that suffer from asymmetric bias. Simulations of training deep models on real datasets verify the efficiency of pipelined analog training.
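As a concrete illustration of the clock-cycle accounting, below is a hedged sketch of a synchronous (GPipe-style) schedule across S pipeline stages, each of which would map to an AIMC tile holding one model chunk. The `sync_schedule` helper and the bubble-fraction arithmetic are assumptions for illustration; the paper's synchronous and asynchronous designs may differ in their exact schedules.

```python
# Sketch of a synchronous pipeline schedule over S analog stages, to
# illustrate clock-cycle accounting. Names are illustrative, not the
# paper's notation.
S, M = 4, 6   # pipeline stages (AIMC tiles), micro-batches per step

def sync_schedule(S, M):
    """Return {cycle: [(stage, microbatch, phase), ...]} for one step:
    forward fill/steady/drain, then backward in reverse stage order."""
    sched = {}
    for m in range(M):                      # forward passes
        for s in range(S):
            sched.setdefault(m + s, []).append((s, m, "fwd"))
    base = M + S - 1                        # cycles consumed by the forward wave
    for m in range(M):                      # backward passes, last stage first
        for s in reversed(range(S)):
            c = base + m + (S - 1 - s)
            sched.setdefault(c, []).append((s, m, "bwd"))
    return sched

sched = sync_schedule(S, M)
total = max(sched) + 1
print("cycle 3:", sched[3])   # several stages busy at once in steady state
print(f"{total} cycles per step vs {2 * S * M} sequential "
      f"(bubble fraction = {(total - 2 * M) / total:.2f})")
```

An asynchronous pipeline would let stages apply updates without waiting for the full drain, trading gradient staleness for fewer idle cycles; the convergence analysis then has to handle both that staleness and the asymmetric analog updates.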
Towards Exact Gradient-based Training on Analog In-memory Computing
Zhaoxian Wu, Tayfun Gokmen, Malte J. Rasch, Tianyi Chen
Given the high economic and environmental costs of using large vision or language models, analog in-memory accelerators present a promising solution for energy-efficient AI. While inference on analog accelerators has been studied recently, the training perspective is underexplored. Recent studies have shown that the "workhorse" of digital AI training, the stochastic gradient descent (SGD) algorithm, converges inexactly when applied to model training on non-ideal devices. This paper puts forth a theoretical foundation for gradient-based training on analog devices. We begin by characterizing the non-convergence issue of SGD, which is caused by the asymmetric updates on the analog devices. We then provide a lower bound on the asymptotic error to show that it reflects a fundamental performance limit of SGD-based analog training rather than an artifact of our analysis. To address this issue, we study a heuristic analog algorithm called Tiki-Taka that has recently exhibited superior empirical performance compared to SGD, and rigorously show that it converges exactly to a critical point, hence eliminating the asymptotic error. Simulations verify the correctness of the analyses.
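A hedged NumPy sketch of the two update rules discussed above, on a toy quadratic with noisy gradients. The asymmetric device model (a bias term proportional to |g| * w) and the two-tile Tiki-Taka-style scheme follow the idealized dynamics commonly used in this line of work; the constants, noise level, and transfer rule here are illustrative assumptions.

```python
# Toy comparison of Analog SGD vs a Tiki-Taka-style two-tile scheme on
# f(w) = 0.5 * ||w - w_star||^2 with noisy gradients. All constants are
# illustrative; the device model is the common idealized dynamics.
import numpy as np

rng = np.random.default_rng(1)
w_star = rng.normal(size=8)
alpha, beta, tau, sigma, steps = 0.05, 0.01, 1.0, 0.1, 5000

def grad(w):
    """Stochastic gradient of the toy objective."""
    return (w - w_star) + sigma * rng.normal(size=w.shape)

def analog_step(w, g, lr):
    """Asymmetric analog update: the realized change acquires a bias term
    proportional to |g| * w / tau, pulling weights toward the symmetric
    point at 0."""
    return w - lr * g - lr * np.abs(g) * w / tau

# Plain Analog SGD: under gradient noise the bias term never vanishes,
# so the iterates stall at a nonzero distance from w_star.
w = np.zeros_like(w_star)
for _ in range(steps):
    w = analog_step(w, grad(w), alpha)

# Tiki-Taka-style two-tile scheme: gradients accumulate on an auxiliary
# tile P (whose bias usefully damps it toward 0), and P is slowly
# transferred into the main weights W.
W, P = np.zeros_like(w_star), np.zeros_like(w_star)
for _ in range(steps):
    P = analog_step(P, grad(W), alpha)   # P integrates gradients of W
    W = W + beta * P                     # slow transfer of P into W

print("Analog SGD error:", np.linalg.norm(w - w_star))
print("Tiki-Taka  error:", np.linalg.norm(W - w_star))
```

In typical runs of this toy, plain Analog SGD stalls at a noise-dependent distance from w_star, while the two-tile scheme lands markedly closer, consistent with the exact-convergence result described in the abstract.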